Skip to content

Further reduction in allocations (from callers)#54

Merged
sasha-s merged 1 commit intosasha-s:mainfrom
kevin-pan-skydio:kp-reduce-allocations-part-2
Mar 18, 2026
Merged

Further reduction in allocations (from callers)#54
sasha-s merged 1 commit intosasha-s:mainfrom
kevin-pan-skydio:kp-reduce-allocations-part-2

Conversation

@kevin-pan-skydio
Copy link
Copy Markdown
Contributor

@kevin-pan-skydio kevin-pan-skydio commented Mar 18, 2026

Continuation of the allocation-reduction work in dcbba57 (which replaced per-lock goroutines and channels with pooled pendingEntry structs and time.AfterFunc). This change tackles the remaining hot-path allocations: the stack-trace buffer allocated on every callers() call and the []stackGID holder slices grown in postLock.

What changed

  • stackBufPool -> callers() now draws its [50]uintptr backing array from a sync.Pool instead of make([]uintptr, 50) on every lock. The buffer pointer is threaded through postLock -> stackGID.buf -> postUnlock -> releaseStackBuf so it gets returned to the pool when the lock is released.
  • holdersPool -> the []stackGID slices stored in lo.cur[p] are recycled. When a mutex has no remaining holders the backing slice is returned to a pool; postLock pulls from this pool before allocating a new one.
  • copyStack for l.order -> because stacks are now backed by pooled buffers that get recycled, entries persisted in the lock-order map (l.order) need independent copies via copyStack() to avoid use-after-recycle bugs.
  • New concurrency tests -> TestManyReadersFewWriters (100 readers, 3 writers), TestConcurrentLockOrderDetection (20 goroutines triggering order violations), and TestDeadlockTimeoutTransientNoHolder (exercises the reschedule path when the deadlock timer fires during a transient no-holder window).

Benchmarks

goos: linux
goarch: amd64
pkg: github.com/sasha-s/go-deadlock
cpu: Intel(R) Xeon(R) Platinum 8488C
BenchmarkRegisterDeregister-8           10624075               114.5 ns/op             0 B/op           0 allocs/op
BenchmarkRegisterDeregister-8            9161024               116.8 ns/op             0 B/op           0 allocs/op
BenchmarkRegisterDeregister-8            9864206               120.8 ns/op             0 B/op           0 allocs/op
BenchmarkLockUnlock-8                    1418875               900.5 ns/op            24 B/op           1 allocs/op
BenchmarkLockUnlock-8                    1439679               758.5 ns/op            24 B/op           1 allocs/op
BenchmarkLockUnlock-8                    1505794               796.0 ns/op            24 B/op           1 allocs/op
PASS
ok      github.com/sasha-s/go-deadlock  23.167s

Prior benchmark:

goos: linux
goarch: amd64
pkg: github.com/sasha-s/go-deadlock
cpu: Intel(R) Xeon(R) Platinum 8488C
BenchmarkRegisterDeregister-8           11854124                98.67 ns/op            0 B/op          0 allocs/op
BenchmarkRegisterDeregister-8           11521474               106.2 ns/op             0 B/op          0 allocs/op
BenchmarkRegisterDeregister-8           11898174                99.43 ns/op            0 B/op          0 allocs/op
BenchmarkLockUnlock-8                    1273275               979.6 ns/op           448 B/op          2 allocs/op
BenchmarkLockUnlock-8                    1279135              1270 ns/op             448 B/op          2 allocs/op
BenchmarkLockUnlock-8                    1000000              1134 ns/op             448 B/op          2 allocs/op
PASS
ok      github.com/sasha-s/go-deadlock  20.605s

Test plan

  • All existing tests pass (go test ./...)
  • Benchmarks run with -benchmem -count=3 confirm the allocation reduction
  • New stress tests excercise high-concurrency reader/writer tracking and concurrent lock-order violation detection
  • TestDeadlockTimeoutTransientNoHolder validates no false-positive deadlock reports from the reschedule path

This change is Reviewable

@kevin-pan-skydio kevin-pan-skydio force-pushed the kp-reduce-allocations-part-2 branch from 1afa550 to 9bbaca9 Compare March 18, 2026 06:25
@sasha-s
Copy link
Copy Markdown
Owner

sasha-s commented Mar 18, 2026

code looks great!

remaining allocation (24 B/op) in BenchmarkLockUnlock. Do we know what it is? beforeAfter key alloc in preLock l.order?

Have you tried running stress tests under -race?

@kevin-pan-skydio
Copy link
Copy Markdown
Contributor Author

@sasha-s last remaining alloc is the stackGID entry that's appended to the holders slice when it needs to grow past its pooled capacity

@kevin-pan-skydio
Copy link
Copy Markdown
Contributor Author

Just ran the tests with -race, one test fails but it also fails on master. Going to see if it fails prior to my first allocations commit as well.

@kevin-pan-skydio
Copy link
Copy Markdown
Contributor Author

kevin-pan-skydio commented Mar 18, 2026

Okay yeah TestLockDuplicate also fails the race check at 7c2aeed "fix: use weak pointers in lock order map to prevent GC leak"

I'll try to fix this test now

@sasha-s
Copy link
Copy Markdown
Owner

sasha-s commented Mar 18, 2026

I'll try to fix this test now

Thank you.
We could ALs land this (clean, self contained) and fix -race separately

other changes:
1. add new unit tests
2. fix existing -race unit test failures
@kevin-pan-skydio kevin-pan-skydio force-pushed the kp-reduce-allocations-part-2 branch from 9bbaca9 to c7663db Compare March 18, 2026 17:55
@kevin-pan-skydio
Copy link
Copy Markdown
Contributor Author

Fixed!

Issue is the tests are writing and reading to Opts fields like Opts.DeadlockTimeout (not really something that happens in production) but sequential tests will do this and leaked go routines from previous tests will read Opts.* while Opts.* is being written to in new tests

Caching Opts.DeadlockTimeout locally so leaked goroutines don't actually read Opts.DeadlockTimeout again solves this

Also removed a few "restore" calls since this could cause races as well on Opts.* fields between subsequent tests.

@sasha-s
Copy link
Copy Markdown
Owner

sasha-s commented Mar 18, 2026

Thank you @kevin-pan-skydio

@sasha-s sasha-s merged commit c1cf6f8 into sasha-s:main Mar 18, 2026
6 checks passed
@sasha-s
Copy link
Copy Markdown
Owner

sasha-s commented Mar 18, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants